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Abstract 

This paper addresses the problem of estimating the Potts parameter jointly with the unknown 
parameters of a Bayesian model within a Markov chain Monte Carlo (MCMC) algorithm. Standard 
MCMC methods cannot be applied to this problem because performing inference on /3 requires 
computing the intractable normalizing constant of the Potts model. In the proposed MCMC method 
the estimation of /3 is conducted using a likelihood-free Metropolis-Hastings algorithm. Experimental 
results obtained for synthetic data show that estimating (3 jointly with the other unknown parameters 
leads to estimation results that are as good as those obtained with the actual value of /?. On the other 
hand, assuming that the value of (3 is known can degrade estimation performance significantly if this 
value is incorrect. To illustrate the interest of this method, the proposed algorithm is successfully 
applied to real bidimensional SAR and tridimensional ultrasound images. 

Index Terms 

Potts-Markov field, Mixture model, Bayesian estimation, Gibbs sampler, Intractable normalizing 
constants. 

I. Introduction 

Modeling spatial correlation in images is fundamental in many image processing appli- 
cations. Markov random fields (MRF) have been recognized as efficient tools for capturing 
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these spatial correlations |TJ-[|6j. One particular MRF often used for Bayesian classification 
and segmentation is the Potts model, which generalizes the binary Ising model to arbitrary 
discrete vectors. The amount of spatial correlation introduced by this model is controlled by 
the so-called granularity coefficient (3. In most applications this important parameter is set 
heuristically by cross-validation. 

This paper studies the problem of estimating the Potts coefficient (3 jointly with the other 
unknown parameters of a standard Bayesian image classification or segmentation problem. 
More precisely, we consider Bayesian models defined by a conditional observation model 
with unknown parameters and a discrete hidden label vector z whose prior distribution is a 
Potts model with hyperparameter (3 (this Bayesian model is defined in Section [Tl]). From a 
methodological perspective, inference on (3 is challenging because the distribution f(z, (3) 
depends on the normalizing constant of the Potts model (hereafter denoted as C((3)), which is 
generally intractable. This problem has received some attention in the recent image processing 
literature, as it would lead to fully unsupervised algorithms |7J-[[TTJ. 

In this work we focus on the estimation of (3 within a Markov chain Monte Carlo (MCMC) 



algorithm that handles 2D or 3D data sets [12|-[16|. MCMC methods are powerful tools to 
handle Bayesian inference problems for which the minimum mean square error (MMSE) or 
the maximum a posteriori (MAP) estimators are difficult to derive analytically. MCMC meth- 
ods generate samples that are asymptotically distributed according to the joint posterior of the 
unknown parameters. These samples are then used to approximate the Bayesian estimators. 
However, standard MCMC methods cannot be applied directly to Bayesian problems based 
on the Potts model. Indeed, inference on (3 requires computing the normalizing constant of 
the Potts model C{(3), which is generally intractable. Specific MCMC algorithms have been 
designed to estimate Markov field parameters in [17|, [18| and more recently in [|7), [[8j. 
A variational Bayes algorithm based on an approximation of C{(3) has also been recently 
proposed in |9}. Maximum likelihood estimation of (3 within expectation-maximization (EM) 
algorithms has been studied in [ fTU| , [11|, jT9|. The strategies involved in these works for 
avoiding computing the normalizing constant C(/3) are summarized below. 



A. Pseudo-likelihood estimators 

One possibility to avoid the computation of C(f3) is to eliminate it from the posterior 
distribution of interest. More precisely, one can think of defining a prior distribution f((3) 
such that the normalizing constant cancels out from the posterior (i.e., f((3) oc C(/3)l^+ (/?)), 
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resulting in the so-called pseudo-likelihood estimators p0|. Although analytically convenient 



this approach generally does not lead to a satisfactory posterior density and results in poor 



estimation [21 1. Also, as noticed in [18| such a prior distribution generally depends on the 
data since the normalizing constant C{/3) depends implicitly on the number of observations 
(priors that depend on the data are not recommended in the Bayesian paradigm p2| , p. 36]). 

B. Approximation ofC(/3) 

Another possibility is to approximate the normalizing constant C((3). Existing approxima- 
tions can be classified into three categories: based on analytical developments, on sampling 
strategies or on a combination of both. A survey of the state-of-the-art approximation methods 



up to 2004 has been presented in [ JT8J . The methods considered in |T8J are the mean field, 
the tree- structured mean field and the Bethe energy (loopy Metropolis) approximations, as 
well as two sampling strategies based on Langevin MCMC algorithms. More recently, exact 
recursive expressions have been proposed to compute C(/3) analytically [9|. However, to our 
knowledge, these recursive methods have only been successfully applied to small problems 
(i.e., for MRFs of size smaller than 40 x 40) with reduced spatial correlation (3 < 0.5. 
Another sampling-based approximation consists in estimating C(/3) by Monte Carlo in- 



tegration [23, Chap. 3], at the expense of very substantial computation and possibly biased 



estimations (bias arises from the estimation error of C(/3)). Better results can be obtained 



by using importance or path sampling methods [24|. These methods have been applied to 
the estimation of (3 within an MCMC image processing algorithm in [17]. Although more 
precise than Monte Carlo integration, approximating C(/3) by importance or path sampling 
still requires substantial computation and is generally unfeasible for large fields. This has 
motivated recent works that reduce computation by combining importance sampling with 
analytical approximations. More precisely, approximation methods that combine importance 
sampling with extrapolation schemes have been proposed for the Ising model (i.e., a 2-state 
Potts model) in and for the 3-state Potts model in (8). However, we have found that this 



extrapolation technique introduces significant bias [25 1 



C. Auxiliary variables and perfect sampling 

Recent works from computational statistics have established that it is possible to avoid 



computing C(/3) within a Metropolis-Hastings MCMC algorithm [23 1 by introducing care 



fully selected auxiliary random variables [26|, [27|. In the work of Moller et. al. [26|, an 



July 24, 2012 DRAFT 



4 



auxiliary vector w distributed according to the same distribution as the label vector z (i.e., 
f(z\P)) is introduced. Metropolis-Hastings algorithms that do not require computing C((3) 
are then proposed to sample the joint distribution /(/?, w\z), which admits the exact desired 
posterior density f(/3\z) as marginal distribution [26]. Unfortunately this method suffers 
from a very low acceptance ratio that degrades severely as the dimension of z increases, 



and is therefore unsuitable for image processing applications [25|. Novel auxiliary variable 
methods with considerably better acceptance ratios have been proposed in p7| by using 
several auxiliary vectors and sequential Monte Carlo samplers [|28|. These methods could be 
interesting for estimating the Potts coefficient (3. However they will not be considered in this 
work because they require substantial computation and are generally too costly for image 
processing applications. An alternative auxiliary variable method based on a one-sample 
estimator of the ratio §^ has been proposed in [29| and recently been improved by using 



several auxiliary vectors and sequential Monte Carlo samplers in |30| (the ratio arises 



CM 

in the MCMC algorithm defined in Section III-C[ ). More details on the application of [29 1 



to the estimation of the Potts coefficient (3 are provided in a separate technical report [25 1 



D. Likelihood-free methods 

Finally, it is possible to avoid computing the normalizing constant C(/3) by using likelihood- 
free MCMC methods [[3TJ. These methods substitute the evaluation of intractable likelihoods 
within a Metropolis-Hastings algorithm by a simulation-rejection scheme. More precisely, 
akin to the auxiliary variable method p6| , an auxiliary vector w distributed according to 
the likelihood f(z\/3) is introduced. Two-step Metropolis-Hastings algorithms that do not 
require evaluating f(z\/3) (nor C(f3)) can then be considered to generate samples that are 



asymptotically distributed according to the exact posterior distribution f(/3\z) [31 1. Although 
generally unfeasible^ these exact methods have given rise to the approximative Bayesian com- 
putation (ABC) framework [32|, which studies likelihood-free methods to generate samples 
from approximate posterior densities f e (/3\z) ~ f(/3\z) at a reasonable computational cost. 
To our knowledge these promising techniques, that are increasingly regarded as "the most 



satisfactory approach to intractable likelihood problems" [32|, have not yet been applied to 
image processing problems. 

'in spite of being theoretically correct, exact likelihood-free algorithms suffer from several major shortcomings that make 



them generally impractical (see Section IV for more details) 
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The main contribution of this paper is to propose an ABC MCMC algorithm for the 
joint estimation of the label vector z, the granularity coefficient (5 and the other unknown 
parameters of a Bayesian model. The estimation of (3 is included within an MCMC algorithm 
through an ABC method particularly adapted to the Potts model and to large data sets. It is 
shown that the estimation of f3 can be easily integrated to existing MCMC algorithms where 
f3 was previously assumed known. Applications to large 2D and 3D images illustrate the 
performance of the proposed method. T 

The remainder of the paper is organized as follows: Bayesian models considered in this 
work are defined in Section II. Section III describes a generic hybrid Gibbs sampler to gen- 
erate samples asymptotically distributed according to the approximate posterior distribution 
of these Bayesian models. The estimation of (5 using a likelihood-free algorithm is discussed 
in detail in Section IV. Experiments on synthetic and real data are presented in Sections V 
and VI respectively. Conclusions are finally reported in Section VI. 

II. Bayesian Model 

Let r n e M + denote the nth observation, or voxel, in a lexicographically vectorized image 
r — (ri, . . . , r N ) T e M. N . We assume that r is made up by multiple regions, characterized by 
their own statistics. More precisely, r is assumed to be associated with K stationary classes 
{Ci, . . . , Ck} such that the observations in the kth class are fully described by the following 
conditional observation model 

r n \z n = k ~ / (r n \6 k ) (1) 

where / (r n \6 k ) denotes a generic observation model with parameter vector k characterizing 
the class Finally, a label vector z = (zi, . . . , zn) is introduced to map observations r 
to classes Ci, . . . ,C K (i.e., z n = k if and only if r n e C k ). 

Several works have established that a Potts model can be used to enhance the fact that 
the probability P[z n — k] of a given voxel is related to the probabilities of its neighbors. 
As explained previously, the amount of spatial correlation between adjacent image pixels 
introduced by the Potts model is controlled by the granularity coefficient f3. Existing image 
classification and segmentation methods have mainly studied the estimation of the class 
parameter vector 6 = {d\ , . . . , #^) T and the label vector z conditionally to a known value 
of (3. However, setting (3 incorrectly can degrade the estimation of 6 and z significantly. 
Moreover, fixing the value of f3 a priori is difficult because different images can have different 
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spatial organizations. This paper considers the problem of estimating the unknown parameter 
vectors and z jointly with (3. This problem is formulated in a Bayesian framework which 
requires to define the likelihood of the observation vector r and the priors for the unknown 
parameters 6, z and (3. 

A. Likelihood 

Assuming that the observations r n are independent conditionally to the label vector z, the 
likelihood function associated with the image r is 

K 

f(r\0,z,P) = f(r\0,z) = H J] f(r n \O k ) (2) 

k=l {n\z n =k} 

where f(r n \0k) is the generic probability density function associated with the observation 
model introduced in ([T]). 

B. Parameter priors 

1 ) Labels: It is natural to consider that there are some correlations between the character- 
istics of a given voxel and those of its neighbors. Since the seminal work of Geman [[TJ, MRFs 
have become very popular to introduce spatial correlation in images. MRFs assume that the 
distribution of a pixel conditionally to all other pixels of the image equals the distribution of 
this pixel conditionally to its neighbors. Consequently, it is important to properly define the 
neighborhood structure. The neighborhood relation between two pixels (or voxels), i and j, 
has to be symmetric: if i is a neighbor of j then j is also a neighbor of i. There are several 
neighborhood structures that have been used in the literature. In the bidimensional case, 
neighborhoods defined by the four or eight nearest voxels represented in Fig. [1] are the most 
commonly used. Similarly, in the tridimensional case the most frequently used neighborhoods 
are defined by the six or fourteen nearest voxels represented in Fig [2| In the rest of this paper 
4-pixel and 6-voxel neighborhoods will be considered for 2D and 3D images, respectively. 
Therefore, the associated set of neighbors, or cliques, have vertical, horizontal and depth 
configurations (see [1] for more details). 

Once the neighborhood structure has been established, the MRF can be defined. Let z n 
denote the random variable indicating the class of the nth image voxel. The whole set 
of random variables z\, Z2, ■ ■ ■ , forms a random field. An MRF is obtained when the 
conditional distribution of z n given the other pixels z_ n = (zi, . . . , z n _i, z n+1 , . . . , z^) only 
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Fig. 1. 4-pixel (left) and 8-pixel (right) neighborhood structures. The pixel considered appears as a void red circle whereas 
its neighbors are depicted in full black and blue. 




Fig. 2. 6-voxel (left) and 14-voxel (right) neighborhood structures. The considered voxel appears as a void red circle 
whereas its neighbors are depicted in full black and blue. 



depends on its neighbors 2y(n)> i- e -> 

/ {z n \z- n ) = f (z n \z v(n) ) (3) 

where V(n) is the index set of the neighbors of the nth voxel, z_„ denotes the vector z 
whose nth element has been removed and Zyt n ) is me sub-vector of z composed of the 
elements whose indexes belong to V(n). 

In the case of K classes, the random variables zx, z 2 , . . . , z N take their values in the finite 
set {1, . . . , K}. The resulting MRF (with discrete values) is a Potts-Markov field, which 
generalizes the binary Ising model to arbitrary discrete vectors. In this study 2D and 3D 
Potts-Markov fields will be considered as prior distributions for z. More precisely, 2D MRFs 
are considered for single-slice (2D) images whereas 3D MRFs are investigated for multiple- 
slice (3D) images. Note that Potts-Markov fields are particularly well suited for label-based 



segmentation as explained in [33 1. By the Hammersley-Clifford theorem the corresponding 
prior for z can be expressed as follows 

f{z\fl = -Le*p[M*)) ( 4 ) 

where 

N 

®P( Z ) = Y1 2 0S(Zn-Z n ,) (5) 
n=l n'eV(n) 
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and where 8(-) is the Kronecker function, (3 is the granularity coefficient and C(f3) is the 
normalizing constant or partition function [ |34| 

C{fi)= (6) 
ze{i,...,K} n 

As explained previously, the normalizing constant C{(3) is generally intractable even for 
K = 2 because the number of summands in ([6]) grows exponentially with the size of z p5| . 
The hyperparameter (3 tunes the degree of homogeneity of each region in the image. A small 
value of (3 induces a noisy image with a large number of regions, contrary to a large value 
of (3 that leads to few and large homogeneous regions. Finally, it is interesting to note that 
despite not knowing C{f3), drawing labels z = (zi, . . . , zn) from the distribution Q can 
be easily achieved by using a Gibbs sampler p3| . 

It is interesting to mention that while the Potts model is an effective means to introduce 
spatial correlation between discrete variables, there are other more complex models that 
could be investigated. In particular, Marroquin et al. [[36) have shown that in segmentation 
applications better results may be obtained by using a two-layer hidden field, where hidden 
labels are assumed to be independent and correlation is introduced at a deeper layer by a 
vectorial Markov field. Similarly, Woolrich et al. [37] have proposed to approximate the 
Potts field by modeling mixture weights with a Gauss-Markov random field. However, these 
alternative models are not well adapted for 3D images because they require significantly 
more computation and memory resources than the Potts model. These overheads result from 
the fact that they introduce (K + 1)N and KN hidden variables respectively, against only 
N for the Potts model (N being the number of image pixels and K the number of discrete 
states of the model). 

2) Parameter vector 6: Assuming a priori independence between the parameters 6%, . . . , K , 
the joint prior for the parameter vector 6 is 

K 

f(6) = Hf(6 k ) (7) 

fc=i 

where f(6 k ) is the prior associated with the parameter vector 6 k which mainly depends on 
the application considered. Two examples of priors / (0) will be investigated in Section |vj 

3) Granularity coefficient (3: As explained previously, fixing the value of (3 a priori can be 
difficult because different images usually have different spatial organizations. A small value 
of (3 will lead to a noisy classification and degrade the estimation of 6 and z. Setting (3 to 
a too large value will also degrade the estimation of 6 and z by producing over- smoothed 
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classification results. Following a Bayesian approach, this paper proposes to assign (3 an 
appropriate prior distribution and to estimate this coefficient jointly with (0, z). In this work, 
the prior for the granularity coefficient (3 is a uniform distribution on (0, B) 

m=u { o, B )ifi) (8) 

where B = 2 represents the maximum possible value of f3. Note that it is unnecessary to 
consider larger values of B since, for the first order neighborhood structure, "when (3 = 2, 
the Potts-Markov model is almost surely concentrated on single-color images" [ |38| p. 30]. 

C. Posterior Distribution of (0, z, /3) 

Assuming the unknown parameter vectors 0, z, (3 are a priori independent and using Bayes 
theorem, the posterior distribution of (0, z, (3) can be expressed as follows 

/ (0, z, (3\r) ex f(r\e, z)f(0)f(z\(3)f((3) (9) 

where oc means "proportional to" and where the likelihood f(r\6, z) has been defined in d2b 
and the prior distributions f(0), f(z) and f((3) in (Q, Q and ([8]) respectively. Unfortunately 
the posterior distribution (|9]) is generally too complex to derive the MMSE or MAP estimators 
of the unknown parameters 0, z and (3. One can think of using the EM algorithm to estimate 
these parameters. Indeed the EM algorithm has received much attention for mixture problems 



|39|. However, the shortcomings of the EM algorithm include convergence to local maxima 
or saddle points of the log-likelihood function and sensitivity to starting values [HO* p. 259]. 
An interesting alternative consists in using an MCMC method that generates samples that are 



asymptotically distributed according to the target distribution (M) [23|. The generated samples 



are then used to approximate the Bayesian estimators. This strategy has been used successfully 



in several recent image processing applications (see [13|, [14|, [41|-[{45| for examples in 
image filtering, dictionary learning, image reconstruction, fusion and segmentation). Many 
of these recent MCMC methods have been proposed for Bayesian models that include a Potts 



MRF (T2J, jT3J, (T5J, [|T6J, |43J. However, these methods only studied the estimation of and 
z conditionally to a known granularity coefficient (3. The main contribution of this paper is 
to study Bayesian algorithms for the joint estimation of 0, z and (3. The next section studies 
a hybrid Gibbs sampler that generates samples that are asymptotically distributed according 
to the posterior ([9]). The samples are then used to estimate the granularity coefficient (3, 
the image labels z and the model parameter vector The resulting sampler can be easily 
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adapted to existing MCMC algorithm where (5 was previously assumed known, and can be 
applied to large images, both in 2D and in 3D. 

III. Hybrid Gibbs Sampler 

This section studies a hybrid Metropolis-within-Gibbs sampler that generates samples that 
are asymptotically distributed according to (|9]). The conventional Gibbs sampler successively 
draws samples according to the full conditional distributions associated with the distribution 
of interest (here the posterior (|9])). When a conditional distribution cannot be easily sampled, 
one can resort to a Metropolis-Hastings (MH) move, which generates samples according to 
an appropriate proposal and accept or reject these generated samples with a given probability. 
The resulting sampler is referred to as a Metropolis-within-Gibbs sampler (see [231 for more 
details about MCMC methods). The sampler investigated in this section is based on the 
conditional distributions P[z|0, /3, r), f(6\z, (3,r) and f(/3\6,z,r) that are provided in the 
next paragraphs (see also Algorithm [T] below). 



Algorithm 1 Proposed Hybrid Gibbs Sampler 



l: Input: initial {0 (o) , z (0) , /3 (0) }, number of iterations T. 

2: for t = 1 to T do 

3: Generate 2 W ~ P[z|0 ( * -1) , z^ 1 ^, ^ t ' t \ r] according to ([12]) 
4: Generate {t) ~ z®, ^ t ~ 1) , r) according to @3) 

5: Generate (5^ ~ f(/3\0 (t \ z {t \ (5^- l \ r) using Algorithm || 

6: end for 



A. Conditional probability P[z\6, (3,r] 

For each voxel n & {1,2, ... , N}, the class label z n is a discrete random variable whose 
conditional distribution is fully characterized by the probabilities 

P [z n = k\z_ n , k , r n , f}\ oc f{r n \O k , z n = k)P [z n \z v(n) , (3] (10) 

where k — 1, . . . , K, and where it is recalled that V(n) is the index set of the neighbors of 
the nth voxel and K is the number of classes. These probabilities can be expressed as 

P [z n = k\z V ( n ),0 k ,{3,r n ] oc 7r njfc (11) 
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■K„ k = exp 



f(r n \6 k ,z n = k). 



with 

n'6V(n) 

Once all the quantities 7r n fc, = have been computed, they are normalized to 

obtain the probabilities Ti n ^ = P \z n = k\zy( n ), k , (3, r n ] as follows 



(12) 



Note that the probabilities of the label vector z in (y~2j) define an MRF. Sampling from this 
conditional distribution can be achieved by using a Gibbs sampler J23| that draws discrete 



values in the finite set {1, . . . , K} with probabilities ( fT2| ). More precisely, in this work z 
has been sampled using a 2-color parallel chromatic Gibbs sampler that loops over n 6 
{1,2, ... ,N} following the checkerboard sequence |46| . 

B. Conditional probability density function f(6\z, (3,r) 

The conditional density f(0\z,/3,r) can be expressed as follows 

f(0\z, 0, r) = f(0\z, r) oc f(r\0, z)f(8) (13) 

where f(r\6,z) and f(6) have been defined in §2§ and Q. Generating samples distributed 



according to ( [13] ) is strongly problem dependent. Some possibilities will be discussed in 
Sections and [Vl| Generally, 6 = {e[, T K ) T can be sampled coordinate -by-coordinate 
using the following Gibbs moves 



fe ~/(0 fc |r, Z )cx J] f{r n \6 k )f{0 k ), k = l,...,K. 

{n\z n =k} 



(14) 



In cases where sampling the conditional distribution ( [14] ) is too difficult, an MH move can 



be used resulting in a Metropolis-within-Gibbs sampler [23| (details about the generation 
of samples k for the problems studied in Sections [V] and [VI] are provided in a separate 
technical report p5|). 

C. Conditional probability density function f((3\6,z,r) 
From Bayes rule, the conditional density f(/3\0,z,r) can be expressed as follows 

f(0\O,z,r)=f(0\z)<xf(z\P)f(J3) (15) 

where f(z\/3) and f((3) have been defined in ([4]) and ([8]) respectively. The generation of 
samples according to f(/3\6,z,r) is not straightforward because f{z\(5) is defined up to the 
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unknown multiplicative constant that depends on (3. One could think of sampling (3 by 
using an MH move, which requires computing the acceptance ratio 

ratio = min {1, £} (16) 

with 

5 f(z\f3(t-V)f(P«-V)q(f3*\p(t-V) 1 ; 

where (3* ~ denotes an appropriate proposal distribution. By replacing Q into 

< [T7| ), £ can be expressed as 

C^*- 1 )) exp [<!>(*)] /(g) g(g^|g) 

where /3* denotes the proposed value of /3 at iteration t and is the previous state of 



the chain. Unfortunately the ratio (18) is generally intractable because of the term 



cm 



The next section presents a likelihood-free MH algorithm that samples (3 without requiring 
to evaluate f(z\(3) and C{(3). 

IV. Sampling the granularity coefficient 

A. Likelihood-free Metropolis-Hastings 

It has been shown in pT| that it is possible to define a valid MH algorithm for posterior 
distributions with intractable likelihoods by introducing a carefully selected auxiliary variable 
and a tractable sufficient statistic on the target density. More precisely, consider an auxiliary 
vector w defined in the discrete state space {1, ... ,K} N of z generated according to the 
likelihood f(z\f3), i.e., 

w ~ f(w\P) 4 exp [$f,(w)] (19) 

Also, let r)(z) be a tractable sufficient statistic of z, i.e., f((3\z) = f[(3\i](z)]. Then, it 
is possible to generate samples that are asymptotically distributed according to the exact 
conditional density f((3\0,z,r) = f(/3\z) by introducing an additional rejection step based 



on r](z) into a standard MH move pT| (see Algorithm [2] below). 

fin 



Note that the MH acceptance ratio in algorithm j^zj is the product of the prior ratio 



/(/3(*-i)) 

and the proposal ratio ^^^(H)] - The generally intractable likelihood ratio j/^0t=T)\ has 
been replaced by the simulation and rejection steps involving the discrete auxiliary vector 
w. Despite not computing f^ffl-V) explicitly, the resulting MH move still accepts candidate 



values (3* with the correct probability (fl6|) [31 1 
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Algorithm 2 Exact likelihood-free MH step pT| 



l: Input: {^ t ~ l \z^} 

2: Generate 0* ~ q (l3*\/3^) 

3: Generate an auxiliary variable w ~ f(w\p* 

4: if r](w) = rj(z^) then 

V Srt r.ntin - g^'" 1 ^*) 

3. IdllU - /(/ g(t-l)) g(^.| j 9(t-l)) 

6: Draw u ~ W(o,i) 
7: if (u < ratio) then 
8: Set p® = p* 
9: else 

10: Set 0® = p«-V 
ii: end if 
12: else 

13: Set /3« = pM 
14: end if 



Unfortunately exact likelihood-free MH algorithms have several shortcomings [32]. For 
instance, their acceptance ratio is generally very low because candidates p* are only accepted 
if they lead to an auxiliary vector w that verifies r](z^) = 77(10). In addition, most Bayesian 
models do not have known sufficient statistics. These limitations have been addressed in the 
ABC framework by introducing an approximate likelihood-free MH algorithm (henceforth 
denoted as ABC-MH) pT| . Precisely, the ABC-MH algorithm does not require the use of a 
sufficient statistic and is defined by a less restrictive criterion of the form p [r)(z^), i](w)~\ < 
e, where p is an arbitrary distance measure and e is a tolerance parameter (note that this 
criterion can be applied to both discrete and continuous intractable distributions, contrary 
to algorithm [2] that can only be applied to discrete distribution). The resulting algorithm 
generates samples that are asymptotically distributed according to an approximate posterior 



density [31 1 



UP\z) « ^/(/3)/(^|/3)l [p[j7(z))7?M]<e] H (20) 
whose accuracy depends on the choice of 77(2) and e (if 77(2) is a sufficient statistic and 



e = 0, then (20) corresponds to the exact posterior density). 



In addition, note that in the exact likelihood-free MH algorithm, the auxiliary vector w 
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has to be generated using perfect sampling p7|, [|48|. This constitutes a major limitation, 



since perfect or exact sampling techniques [47 1, [48 1 are too costly for image processing 
applications where the dimension of z and w can exceed one million pixels. A convenient 
alternative is to replace perfect simulation by a few Gibbs moves with target density f(w\/3*) 
as proposed in [ |49| . The accuracy of this second approximation depends on the number of 
moves and on the initial state of the sampler. An infinite number of moves would clearly 



lead to perfect simulation regardless of the initialization. Inspired from [50|, we propose to 



use z as initial state to produce a good approximation with a small number of moves. A 
simple explanation for this choice is that for candidates (3* close to the mode of f(/3\z), the 
vector z has a high likelihood f(z\/3). In other terms, using z as initial state does not lead 
to perfect sampling but provides a good final approximation of f(/3\z) around its mode. The 
accuracy of this approximation can be easily improved by increasing the number of moves at 



the expense of computing time. However, several simulation results in [25|, [ 30 1 have shown 



that the resulting ABC algorithm approximates f(/3\z) correctly even for a small number of 
moves. 

B. Choice of r\[z), p and e 

As explained previously, ABC algorithms require defining an appropriate statistic r](z), 
a distance function p and a tolerance level e. The choice of r/(z) and p are fundamental 
to the success of the approximation, while the value of e is generally less important [ |3"2| . 
Fortunately the Potts MRF, being a Gibbs random field, belongs to the exponential family 



and has the following one-dimensional sufficient statistic [32 1, [49] 



N 



(21) 



- 6 ( Z n-Zn>) 
n=l n'SV(n) 

where it is recalled that V(n) is the index set of the neighbors of the nth voxel. Note that 
because (|2~Tj) is a sufficient statistic, the approximate posterior f € (j3\z) tends to the exact 



posterior f(/3\z) as e — > [31 1. 

The distance function p considered in this work is the one-dimensional Euclidean distance 



p[r](z),T](w)} = \r)(z) -r)(w)\ 



(22) 



which is a standard choice in ABC methods [32|. Note from (21 ) and (22) that the distance 



p[-, ■] between i](z) and r/(w) reduces to the difference in the number of active cliques in 
z and w. It is then natural to set the tolerance as a fraction of that number, i.e., e = vr\{z) 
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(y = j^jq will be used in our experiments). Note that the choice of v is crucial when the 
prior density f((3) is informative because increasing v introduces estimation bias by allowing 
the posterior density to drift towards the prior [51]. However, in this work the choice of v 
is less critical because (3 has been assigned a flat prior distribution. 



C. Proposal distribution q((3* 1(3^ x) ) 

Finally, the proposal distribution used to explore the set (0,B) is chosen as 

a truncated normal distribution centered on the previous value of the chain with variance 

/T ~jv- ( o, B) (/^-^4). (23) 

where the variance si is adjusted during the burn-in period to ensure an acceptance ratio 



close to 5%, as recommended in [25 1. This proposal strategy is referred to as random walk 
MH algorithm p3| p. 245]. The choice of this proposal distribution has been motivated by 
the fact that for medium and large problems (i.e., Markov fields larger than 50 x 50 pixels) 
the distribution f((3\z) becomes very sharp and can be efficiently explored using a random 
walk. 

The resulting ABC MH method is summarized in Algorithm [3] below. Note that Algorithm 
[3] corresponds to step 5 in Algorithm [T] 

V. Experiments 

This section presents simulation results conducted on synthetic data to assess the impor- 
tance of estimating the hyperparameter (3 from data as opposed to fixing it a priori (i.e., 
the advantage of estimating the posterior p(0, z, (3\r) instead of fixing (3). Simulations have 
been performed as follows: label vectors distributed according to a Potts MRF have been 
generated using different granularity coefficients (in this work bidimensional fields of size 
256 x 256 pixels have been considered). Each label vector has in turn been used to generate an 
observation vector following the observation model ([T]). Finally, samples distributed according 
to the posterior distribution of the unknown parameters (6, z, (3) have been estimated from 
each observation vector using Algorithm [T] coupled with Algorithm [3] (assuming the number 
of classes K is known). The performance of the proposed algorithm has been assessed by 
comparing the resulting Bayesian estimates with the true values of the parameters. This 
paper presents simulation results obtained using two different mixture models. Additional 
simulation results using other mixture models are available in a separate technical report 
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Algorithm 3 ABC likelihood-free MH step (31 1 



l: Input: {/?(* x \ z^\u, s 2 ^}, number of moves M. 

2: Generate /3* ~ A/" (0)jB) (z^* -1 ), s|) 

3: Generate ~ /(iu|/3*) through M Gibbs moves with initial state 
4: if \r)(z^) — r)(w)\ < urj(z^) then 

V Srt mtin - g^'" 1 ^*) 

3. IdUO - /(/ g(t-l)) ^ ^(t-l)) 

6: Draw m ~ W(o,i) 

7: if (?i < ratio) then 

8: Set = p* 

9: else 

10: Set (3^ = 

ii: end if 

12: else 

13: Set /3« = /3(*-« 

14: end if 



(25| . Finally, comparisons with the state-of-the-art methods proposed in (8J, [26 1, [29 1 are 
also reported in (25J . 

A. Mixture of gamma distributions 

The first experiment considers a mixture of gamma distributions. This observation model 
is frequently used to describe the statistics of pixels in multilook SAR images and has 



been extensively applied for SAR image segmentation [52|. Accordingly, the conditional 



observation model ([T]) is defined by a gamma distribution with parameters L and m k (52 1 



TnW = k ~ f(r n \e k ) = (—) ^exp (-^) (24) 

\m k J T(L) V m k J 

where T(t) = J +o ° u t ~ 1 e~ u du is the standard Gamma function and L (the number of 

looks) is assumed to be known (L = 3 in this paper). The means m k (k = 1, . . . , K) are 

assigned inverse gamma prior distributions as in (52| . The estimation of (3, z and = m = 

(mi, . . . , rriK) T is then achieved by using Algorithm [I] The sampling strategies described in 



Sections III-A and IV can be used for the generation of samples according to P[z\m, f3,r] 
and f((3\m, z,r). More details about simulation according to f(m\z,/3,r) are provided in 
the technical report (25|. 
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The first results have been obtained for a 3-component gamma mixture with parameters 
m = (1; 2; 3). Fig. |3^a) shows the densities of the gamma distributions defining the mixture 
model. Note that there is significant overlap between the densities making the inference 
problem very challenging. For each experiment the MAP estimates of the class labels z 
have been computed from a single Markov chain of T = 1 000 iterations whose first 400 
iterations (burn-in period) have been removed. Table [j] shows the percentage of MAP class 
labels correctly estimated. The first column corresponds to labels that were estimated jointly 
with (3 whereas the other columns result from fixing (3 to different a priori values. To ease 
interpretation, the best and second best results for each simulation scenario in Table [j] are 
highlighted in red and blue. We observe that the proposed method performs as well as if (3 
was perfectly known. On the other hand, setting (3 to an incorrect value may severely degrade 
estimation performance. Table [IT] shows the MMSE estimates of (3 and m corresponding to 
the three simulations of the first column of Table [I] (proposed method) as well as the standard 
deviations of the estimates (results are displayed as [mean ± standard deviation]). We observe 
that these values are in good agreement with the true values used to generate the observation 
vectors. Finally, for illustration purposes, Fig. [4] shows the MAP estimates of the class labels 
corresponding to the simulation scenario reported in the last row of Table [TJ More precisely, 
Fig. |4ja) depicts the class label map, which is a realization of a 3-class Potts MRF with 
(3 = 1.2. The corresponding synthetic image is presented in Fig. |4jb). Fig. |4|c) shows the 
class labels obtained with the proposed method and Fig. |4jd) those obtained when (3 is 
perfectly known. Lastly, Figs. |4je)-(h) show the results obtained when f3 is fixed incorrectly 
to 0.6,0.8, 1.0 and 1.4. We observe that the classification produced by the proposed method 
is very close to that obtained by fixing (3 to its true value, whereas fixing (3 incorrectly results 
in either noisy or excessively smooth results. 




(a) gamma mixture (b) a-Rayleigh mixture 

Fig. 3. Probability density functions of the distributions mixed for the first set and the second set of experiments. 
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TABLE I 

Gamma Mixture: Class label estimation (K = 3) 









Correet classification with fixed 




Proposed method 


= 0.6 


/3 = 0.8 


= 1.0 


/3 = 1.2 


= 1.4 


True = 0.8 


= 0.80 


62.2% 


61.6% 


61.7% 


58.8% 


41.5% 


40.1% 


True = 1.0 


= 1.00 


77.9% 


67.3% 


73.4% 


77.7% 


75.9% 


74.2% 


True = 1.2 


g = l.is 


95.6% 


76.6% 


87.8% 


94.9% 


95.6% 


95.5% 



TABLE II 

Gamma Mixture: Parameter estimation 





true 


MMSE 


true 


MMSE 


true 


MMSE 





0.80 


0.80 ± 0.01 


1.00 


1.00 ± 0.01 


1.20 


1.18 ± 0.02 


mi 


1 


0.99 ± 0.02 


1 


1.00 ± 0.02 


1 


0.99 ± 0.03 


m 2 


2 


1.99 ± 0.02 


2 


1.98 ± 0.02 


2 


1.98 ± 0.07 


m 3 


3 


2.98 ± 0.03 


3 


2.98 ± 0.04 


3 


3.01 ± 0.03 



B. Mixture of a-Rayleigh distributions 

The second set of experiments has been conducted using a mixture of a-Rayleigh distri- 
butions. This observation model has been recently proposed to describe ultrasound images 



of dermis [53 1 and has been successfully applied to the segmentation of skin lesions in 
3D ultrasound images | fl6) . Accordingly, the conditional observation model ([T]) used in the 
experiments is defined by an a-Rayleigh distribution 

r n \z n = k ~ f{r n \0 k ) = p a n(r n \a k , j k ) (25) 

with 

POO 

p a ii(r n \a k ^ k ) = r n / Xexp[-(-f k X) ak ] J (r n X) dX 
Jo 

where a k and ^ k are the parameters associated with the /cth class and where J is the zeroth 
order Bessel function of the first kind. Note that this distribution has been also used to 



model SAR images in [54 1, [55]. The prior distributions assigned to the parameters a k and 
7^ (k = 1, . . . , K) are uniform and inverse gamma distributions as in [16]. The estimation 
of (5, z and 6 = (ct T , *y T ) T = (ai, . . . , ax, 7i, • ■ ■ , 1k) t is performed by using Algorithm 



[T] The sampling strategies described in Sections III-A and |IV] can be used for the generation 



of samples according to P[z\a,~y, (3,r] and f((3\ct, 7, z, r). More details about simulation 



according to f(ct\j, z, f3, r) and /(7|a, z, f3, r) are provided in the technical report [25 1 
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(a) True Labels 03 = 1.2) 




(c) Estimated /3 




(e) p = 0.6 




(g) p = 1.0 




(b) Observations 




(d) True $ = 1.2 




(f) /3 = 0. 




(h) = 1.4 



Fig. 4. Gamma mixture: estimated labels using the MAP estimators, (a) Ground truth, (b) observations, (c) 
proposed algorithm (estimated /3),(d) true /3 = 1.2, (e)-(h) fixed (3 = (0.6, 0.8, 1.0, 1.2, 1.4). 



The following results have been obtained for a 3 -component a-Rayleigh mixture with 
parameters a = (1.99; 1.99; 1.80) and 7 = (1.0; 1.5; 2.0). Fig. |3jb) shows the densities of 
the components associated with this a-Rayleigh mixture. Again, note that there is significant 
overlap between the mixture components making the inference problem very challenging. 
For each experiment the MAP estimates of the class labels z have been computed from 
a single Markov chain of T = 2 000 iterations whose first 900 iterations (burn-in period) 



have been removed. Table III shows the percentage of MAP class labels correctly estimated. 
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The first column corresponds to labels that were estimated jointly with f3 whereas the other 
columns result from fixing (3 to different a priori values. To ease interpretation, the best 



and second best results for each simulation scenario in Table III are highlighted in red and 
blue. We observe that even if the mixture components are hard to estimate, the proposed 
method performs similarly to the case of a known coefficient (3. Also, setting (3 incorrectly 



degrades estimation performance considerably. Table IV shows the MMSE estimates of (3, 
ex and 7 corresponding to the three simulations of the first column of Table [In] (proposed 
method). We observe that these values are in good agreement with the true values used to 
generate the observation vectors. To conclude, Fig. [5] shows the MAP estimates of the class 
labels corresponding to the simulation associated with the scenario reported in the last row 
of Table III More precisely, the actual class labels are displayed in Fig. |5^a), which shows 
a realization of a 3-class Potts MRF with (3 = 1.2. The corresponding observation vector is 
presented in Fig.^b). Fig.^c) and Fig.^d) show the class labels obtained with the proposed 
method and with the actual value of (3. Lastly, Figs. |5]^e)-(h) show the results obtained when 
(3 is fixed incorrectly to 0.6, 0.8, 1.0 and 1.4. We observe that the proposed method produces 
classification results that are very similar to those obtained when (3 is fixed to its true value. 
On the other hand, fixing (3 incorrectly generally leads to very poor results. 

TABLE III 

q-Rayleigh Mixture: Class label estimation (K = 3) 







Correct classification with p fixed 




Proposed method 


P = 0.6 


P = 0.8 


p = 1.0 


P = 1.2 


p = 1.4 


True P = 0.8 


P = 0.82 


56.5% 


52.3% 


56.3% 


44.8% 


33.3% 


33.4% 


True p = 1.0 


P = 1.01 


75.5% 


61.1% 


68.1% 


75.5% 


54.1% 


41.7% 


True P = 1.2 


P = 1.18 


95.0% 


67.7% 


83.1% 


94.4% 


94.8% 


69.5% 



VI. Application to real data 

After validating the proposed Gibbs sampler on synthetic data, this section presents two 
applications of the proposed algorithm to real data. 

A. Pixel classification of a 2D SAR image 

The proposed method has been applied to the unsupervised classification of a 2D multilook 
SAR image acquired over Toulouse, France, depicted in Fig. [6|a). This image was acquired 
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TABLE IV 

o-Rayleigh Mixture: Parameter estimation 





true 


MMSE 


true 


MMSE 


true 


MMSE 




0.80 


0.81 ± 0.013 


1.00 


1.01 ± 0.015 


1.20 


1.18 ± 0.021 


ai 


1.99 


1.98 ± 0.010 


1.99 


1.99 ± 0.010 


1.99 


1.99 ± 0.004 


71 


1.00 


1.00 ± 0.009 


1.00 


1.00 ± 0.009 


1.00 


1.00 ± 0.005 


Q2 


1.99 


1.99 ± 0.007 


1.99 


1.97 ± 0.008 


1.99 


1.99 ± 0.005 


72 


1.50 


1.47 ± 0.012 


1.50 


1.49 ± 0.010 


1.50 


1.50 ± 0.005 


CK3 


1.80 


1.80 ± 0.008 


1.80 


1.80 ± 0.006 


1.80 


1.79 ± 0.007 


73 


2.00 


2.02 ± 0.014 


2.00 


1.97 ± 0.017 


2.00 


2.00 ± 0.009 



by the TerraSAR-X satellite at lm resolution and results from summing 3 independent SAR 
images (i.e., L = 3). Potts MRFs have been extensively applied to SAR image segmentation 



using different observations models [19|, [56|-[58|. For simplicity the observation model 



chosen in this work is a mixture of gamma distributions (see Section V-A and the report [25 1 
for more details about the gamma mixture model). The proposed experiments were conducted 
with a number of classes K = 4 (setting K > 4 resulted in empty classes). Fig.|6jb) shows the 
results obtained with the proposed method. The MMSE estimate of the granularity coefficient 
corresponding to this result is (3 = 1.62 ± 0.05, which has enforced the appropriate amount 
of spatial correlation to handle noise and outliers while preserving contours. Fig. |6jc) shows 



the results obtained by fixing (3 = 1, as proposed in [57 1. These results have been computed 
from a single Markov chain of T = 5 000 iterations whose first 1 000 iterations (burn-in 
period) have been removed. Finally, for visual interpretation Fig. |6jd) shows the same region 
observed by an airborne optical sensor. We observe that the classification obtained with the 
proposed method has clear boundaries and few miss-classifications. 

B. Lesion segmentation in a 3D ultrasound image 

The proposed method has also been applied to the segmentation of a skin lesion in a 
dermatological 3D ultrasound image. Ultrasound-based lesion inspection is an active topic in 
dermatological oncology, where patient treatment depends mainly on the depth of the lesion 
and the number of skin layers it has invaded. This problem has been recently addressed 
using an a-Rayleigh mixture model ( |25| ) coupled with a tridimensional Potts MRF as prior 
distribution for the class labels | fl6) . The algorithm investigated in [ fT6| estimates the label 
vector and the mixture parameters conditionally to a known value of (3 that is set heuristically 
by cross-validation. The proposed method completes this approach by including the estimation 
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(a) True Labels (J3 = 1.2) (b) Observations 




(g) P = 1.0 (h) /3 = 1.4 

Fig. 5. a-Rayleigh mixture: MAP estimates of the class labels, (a) Ground truth, (b) observations, (c) proposed 
algorithm (estimated /3),(d) true j3 = 1.2, (e)-(h) fixed /3 = (0.6, 0.8, 1.0, 1.2, 1.4). 

of (3 into the segmentation problem. Some elements of this model are recalled in the technical 
report (15). 

Fig. [T^a) shows a 3D B-mode ultrasound image of a skin lesion, acquired at 100MHz 
with a focalized 25MHz 3D probe (the lesion is contained within the ROI outlined by the 
red rectangle). Fig. |7Jb) presents one slice of the 3D MAP label vector obtained with the 
proposed method. The MMSE estimate of the granularity coefficient corresponding to this 
result is $ = 1.020 ± 0.007. To assess the influence of 0, Figs. ^Kg) show the MAP 
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20 40 60 80 100 




20 40 60 



(a) Multilook SAR Image 



(b) Labels = 1.62) 





(c) Labels 09=1) 



(d) Optical Image of Toulouse 



Fig. 6. Pixel classification in a multilook SAR image (c). MAP labels when (3 is estimated (d) and (3 — 1 (e). 
Figs, (a)-(b) provide optical images of the same region. 



class labels obtained with the algorithm proposed in [16| for different values of (3. These 
results have been computed using K = 4 since the region of interest (ROI) contains 3 types 
of healthy skin layers (epidermis, papillary dermis and reticular dermis) in addition to the 
lesion. Labels have been computed from a single Markov chain of T = 12 000 iterations 
whose first 2 000 iterations (burn-in period) have been removed. 

We observe that the proposed method produces a very clear segmentation that not only 
sharply locates the lesion but also provides realistic boundaries for the healthy skin layers 
within the ROI. This result indicates that the lesion, which is known to have originated at the 
dermis-epidermis junction, has already invaded the upper half of the papillary dermis. We also 
observe that the results obtained by fixing to a small value are corrupted by ultrasound 
speckle noise and fail to capture the different skin layers. On the other hand, choosing a 
too large value of (3 enforces excessive spatial correlation and yields a segmentation with 
artificially smooth boundaries. Finally, Fig. [8] shows a frontal viewpoint of a 3D reconstruction 
of the lesion surface. We observe that the tumor has a semi-ellipsoidal shape which is cut at 
the upper left by the epidermis-dermis junction. The tumor grows from this junction towards 
the deeper dermis, which is at the lower right. 
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(a) Dermis view with skin lesion (ROI = 160 x 175 x 16). 



ft = 1.0197 

»^ * , J 

20 40 60 80 100 120 140 160 



(b) (Estimated (3) 





20 40 60 BO 100 120 



(C) 08 = 0.5) 



20 40 60 80 100 120 140 



(d) 03 = 0.75) 



20 40 60 80 100 120 140 160 



(e) 03 = 1.0) 





20 40 60 80 100 120 140 



(f) 09 = 1-25) 



20 40 60 BO 100 120 140 160 



(g) 08 = 1.5) 



Fig. 7. Log-compressed US images of skin lesion and the corresponding estimated class labels (lesion = black, 
epidermis = white, pap. dermis = dark gray, ret. dermis = light gray). MAP estimates of the class labels. Fig. 
(b) shows the results obtained r with the proposed method. Figs, (c)-(g) show the results obtained with the 
algorithm JT6) for f3 = (0.5, 0.75, 1, 1.25, 1.5). 



VII. Concluding Remarks 

This paper presented a hybrid Gibbs sampler for estimating the Potts parameter (3 jointly 
with the unknown parameters of a Bayesian model. In most image processing applications 
this important parameter is set heuristically by cross-validation. Standard MCMC methods 
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Fig. 8. Frontal viewpoint of a 3D reconstruction of the skin lesion. 



cannot be applied to this problem because performing inference on (3 requires computing 
the intractable normalizing constant of the Potts model. In this work the estimation of (3 has 
been included within an MCMC method using an ABC likelihood-free Metropolis-Hastings 
algorithm, in which intractable terms have been replaced by simulation-rejection schemes. 
The ABC distance function has been defined using the Potts potential, which is the natural 
sufficient statistic of the Potts model. The proposed method can be applied to large images 
both in 2D and in 3D scenarios. Experimental results obtained for synthetic data showed 
that estimating (3 jointly with the other unknown parameters leads to estimation results that 
are as good as those obtained with the actual value of (3. On the other hand, choosing 
an incorrect value of (3 can degrade the estimation performance significantly. Finally, the 
proposed algorithm was successfully applied to real bidimensional SAR and tridimensional 
ultrasound images. This study assumed that the number of classes K is known. Future works 
could relax this assumption by studying the estimation of (3 within a reversible jump MCMC 



algorithm p9[ or by considering model choice ABC methods fl49J . Other perspectives for 



future work include the estimation of the total variation regularization parameter in image 



restoration problems [ 60 1 and the estimation of texture descriptors defined through Markov 
fields Q. 
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